What is the impact of socio-economic factors on Criminality in France?

Author

Pierre Monari, Elwin Freudiger

Published

November 19, 2023

1. Introduction

1.1 Overview and motivations

This project will analyze the factors influencing Criminality in France. We chose this Subject because we are both french and criminality consistently ranks as one of the most important priority for the French population. In our study, we wanted to focus on the social and economic factors contributing to this issue. France is often regarded as one of the major world powers, and yet, it is clear that similarly to the rest of the world, the french population is feeling more and more insecure regarding crime. Crime is impacting the political landscape very much. This subject has also made it’s way into everyday discussions as it proves to be a polarizing topic. In the last thirty years, Politicians have often promised to tackle crime through several means, and as the population wants more to be done against it, the government has often made it one of their top priority.

In recent times, there has been a noticeable rise in criminal activities across France. Regions such as “Seine Saint Denis” in the Paris area. has seen emboldened criminals committing crimes in broad daylight, contributing to a perception of increased chaos and increasing the feeling of insecurity. Similar concerns have surfaced in cities like Marseille. Additionally, it’s worth noting that the prison population has reached an all-time high in July 2023. But what are the reasons for increased criminality?

The goal of our research is to synthesize the main reasons behind criminality. Medias have often presented various reasons to explain crime, our research will look at those hints and see if they may have any statistical significance. It should be noted that data regarding “Ethnic origin” is not available, this variable is sometimes singled out as one of the driver of criminality and unlike in the United States, France has outlawed any form of statistics on this subject. as such, we will mostly focus on socio-economics drivers behind crime. France is facing substantial socio-economics issues, unemployment, poverty, conditions of leaving, and school level. Our project explores the link between socio-economic factors and criminality rates. Establishing such a link would give a new outlook on crime and its causes. Additionally we would like to know whether, this data could be used by the government to address public safety. Moreover, what are the core human drivers that push humans to commit crimes? How their socio-economic environment and their educational path may impact their ability to commit crimes?

1.2 Associated papers

https://www.institutpourlajustice.org/content/2017/12/Pauvreté-et-délinquance.pdf

1.3 Research questions

The purpose of this study is to evaluate the impact of different socioeconomic factors on criminality in France. We are using for that a french government database which divide crimes in 14 categories:

  1. Destruction and intentional damages
  2. Thefts without violence against people
  3. Residential burglaries
  4. Assault and battery
  5. Thefts from vehicles
  6. Domestic assault and battery
  7. Theft of vehicle accessories
  8. Other intentional assault and battery
  9. Sexual violence
  10. Violent robberies without weapons
  11. Drug trafficking
  12. Robbery with weapons

We will consider the following macroeconomic variables:

  • Poverty rate,
  • Unemployment rate
  • School indicators(Success rate at an exam/Share of adults without a dimploma)
  • Density of Population
  • Results of previous presidential election.

These are the following question we will focus on:

  • Can the criminality rate in France be explained using the unemployment rate?
  • Can the criminality rate in France be explained using the poverty rate?
  • Can the criminality rate in France be explained using education indicators ?
  • Can the density of population in territories explain criminality?
  • Can the immigration rate explain criminality?
  • Can the previous presidential results in territories have an impact on criminality?
  • Is there a correlation between these mentioned variables?

2. Data Wrangling

In this section,we will look at our raw datasets, what information they give us, how we wrangle/clean them to arrive at our final dataset. Lastly we will look at the mistakes in our data and any anomalies or outliers we may find.

Firstly, we will use a dataset with values by french state. This dataset will not be used for our Regression analysis as only having 95 observations is not enough to have a good statistical analysis.

2.1 Raw datasets

2.1.1 Data by Department

Population

There is one excel sheet per year, starting in 1975 and ending in 2023

Estimation de population au 1er janvier, par département, sexe et grande classe d'âge ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13 ...14 ...15 ...16 ...17 ...18 ...19 ...20
Année 2023 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Départements NA Ensemble NA NA NA NA NA Hommes NA NA NA NA NA Femmes NA NA NA NA NA
NA NA 0 à 19 ans 20 à 39 ans 40 à 59 ans 60 à 74 ans 75 ans et plus Total 0 à 19 ans 20 à 39 ans 40 à 59 ans 60 à 74 ans 75 ans et plus Total 0 à 19 ans 20 à 39 ans 40 à 59 ans 60 à 74 ans 75 ans et plus Total
01 Ain 170727 149960 183127 109363 58760 671937 88293 75548 90943 52486 24308 331578 82434 74412 92184 56877 34452 340359
02 Aisne 125684 113454 132390 98737 52526 522791 64688 57214 65872 47242 20699 255715 60996 56240 66518 51495 31827 267076

This dataset gives us information regarding the population inside the dataset, the population is also divided by gender and age groups. We will only use the total number of people per department. It shoudl be noted that what we call our department number is a Character. This is due to the Corsica departments, split in two in 1975 and named 2A and 2B, they used to department number 20.

Source: Population by state

Crime

There is one excel sheet per department.

Index libellé index _2022_08 _2022_07 _2022_06 _2022_05 _2022_04 _2022_03 _2022_02 _2022_01 _2021_12 _2021_11 _2021_10 _2021_09 _2021_08 _2021_07 _2021_06 _2021_05 _2021_04 _2021_03 _2021_02 _2021_01 _2020_12 _2020_11 _2020_10 _2020_09 _2020_08 _2020_07 _2020_06 _2020_05 _2020_04 _2020_03 _2020_02 _2020_01 _2019_12 _2019_11 _2019_10 _2019_09 _2019_08 _2019_07 _2019_06 _2019_05 _2019_04 _2019_03 _2019_02 _2019_01 _2018_12 _2018_11 _2018_10 _2018_09 _2018_08 _2018_07 _2018_06 _2018_05 _2018_04 _2018_03 _2018_02 _2018_01 _2017_12 _2017_11 _2017_10 _2017_09 _2017_08 _2017_07 _2017_06 _2017_05 _2017_04 _2017_03 _2017_02 _2017_01 _2016_12 _2016_11 _2016_10 _2016_09 _2016_08 _2016_07 _2016_06 _2016_05 _2016_04 _2016_03 _2016_02 _2016_01 _2015_12 _2015_11 _2015_10 _2015_09 _2015_08 _2015_07 _2015_06 _2015_05 _2015_04 _2015_03 _2015_02 _2015_01 _2014_12 _2014_11 _2014_10 _2014_09 _2014_08 _2014_07 _2014_06 _2014_05 _2014_04 _2014_03 _2014_02 _2014_01 _2013_12 _2013_11 _2013_10 _2013_09 _2013_08 _2013_07 _2013_06 _2013_05 _2013_04 _2013_03 _2013_02 _2013_01 _2012_12 _2012_11 _2012_10 _2012_09 _2012_08 _2012_07 _2012_06 _2012_05 _2012_04 _2012_03 _2012_02 _2012_01 _2011_12 _2011_11 _2011_10 _2011_09 _2011_08 _2011_07 _2011_06 _2011_05 _2011_04 _2011_03 _2011_02 _2011_01 _2010_12 _2010_11 _2010_10 _2010_09 _2010_08 _2010_07 _2010_06 _2010_05 _2010_04 _2010_03 _2010_02 _2010_01 _2009_12 _2009_11 _2009_10 _2009_09 _2009_08 _2009_07 _2009_06 _2009_05 _2009_04 _2009_03 _2009_02 _2009_01 _2008_12 _2008_11 _2008_10 _2008_09 _2008_08 _2008_07 _2008_06 _2008_05 _2008_04 _2008_03 _2008_02 _2008_01 _2007_12 _2007_11 _2007_10 _2007_09 _2007_08 _2007_07 _2007_06 _2007_05 _2007_04 _2007_03 _2007_02 _2007_01 _2006_12 _2006_11 _2006_10 _2006_09 _2006_08 _2006_07 _2006_06 _2006_05 _2006_04 _2006_03 _2006_02 _2006_01 _2005_12 _2005_11 _2005_10 _2005_09 _2005_08 _2005_07 _2005_06 _2005_05 _2005_04 _2005_03 _2005_02 _2005_01 _2004_12 _2004_11 _2004_10 _2004_09 _2004_08 _2004_07 _2004_06 _2004_05 _2004_04 _2004_03 _2004_02 _2004_01 _2003_12 _2003_11 _2003_10 _2003_09 _2003_08 _2003_07 _2003_06 _2003_05 _2003_04 _2003_03 _2003_02 _2003_01 _2002_12 _2002_11 _2002_10 _2002_09 _2002_08 _2002_07 _2002_06 _2002_05 _2002_04 _2002_03 _2002_02 _2002_01 _2001_12 _2001_11 _2001_10 _2001_09 _2001_08 _2001_07 _2001_06 _2001_05 _2001_04 _2001_03 _2001_02 _2001_01 _2000_12 _2000_11 _2000_10 _2000_09 _2000_08 _2000_07 _2000_06 _2000_05 _2000_04 _2000_03 _2000_02 _2000_01 _1999_12 _1999_11 _1999_10 _1999_09 _1999_08 _1999_07 _1999_06 _1999_05 _1999_04 _1999_03 _1999_02 _1999_01 _1998_12 _1998_11 _1998_10 _1998_09 _1998_08 _1998_07 _1998_06 _1998_05 _1998_04 _1998_03 _1998_02 _1998_01 _1997_12 _1997_11 _1997_10 _1997_09 _1997_08 _1997_07 _1997_06 _1997_05 _1997_04 _1997_03 _1997_02 _1997_01 _1996_12 _1996_11 _1996_10 _1996_09 _1996_08 _1996_07 _1996_06 _1996_05 _1996_04 _1996_03 _1996_02 _1996_01
1 Règlements de compte entre malfaiteurs 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 Homicides pour voler et à l'occasion de vols 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
3 Homicides pour d'autres motifs 0 3 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 2 0 0 0 1 0 0 0 0 1 0 4 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 2 1 1 0 0 1 0 2 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 2 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 2 0 0 0 0 0 1 0 0 1 0 0 2 1 0 0 0 1 0 2 0 0 0 2 0 2 0 0 0 0 0 2 0 0 0 1 1 0 1 0 0 0 1 0 0 0 6 0 2 0 2 0 0 3 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 2 1 0 0 1 0 0 4 0 0 2 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 2 0 1 0 0 2 1 0 0 1 2 1 0 0 1 0 0 2 0 0 1 2 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 0 0 0 0 0 2 0 0 1 1 0 0 1 0 3 0 1 1 1 0 1 0 3 0 0 0
4 Tentatives d'homicides pour voler et à l'occasion de vols 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
5 Tentatives homicides pour d'autres motifs 0 4 0 1 1 0 1 0 2 0 0 0 0 3 1 0 0 0 1 0 0 1 1 0 3 0 1 0 3 0 0 0 2 0 0 0 0 1 0 0 0 1 0 2 0 1 0 2 0 1 0 1 3 0 1 1 0 0 0 1 1 0 2 5 2 0 1 0 1 0 1 1 0 3 1 3 4 0 0 0 0 0 1 1 0 1 0 0 2 0 2 0 0 0 0 1 0 1 0 2 0 0 0 1 0 0 2 0 0 1 0 1 0 0 0 0 0 0 2 0 0 0 0 1 0 0 1 0 0 0 2 0 0 0 0 0 2 0 0 0 0 1 2 0 1 0 1 1 0 0 3 0 1 0 1 0 1 0 0 0 0 0 1 1 7 0 0 5 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 3 1 0 1 4 0 0 0 0 0 3 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 2 1 0 0 1 1 2 0 0 1 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 1 4 0 1 1 0 1 0 1 1 0 0 1 0 0 0 2 2 1 0 0 0 0 0 0 0 1 0 0 0 1 0 2 1 0 3 1 1 0 1 0 0 1 1 2 2 0 1 1 0 1 0 1 0 0 1 1 0 0 2 1 0 0 0 1 0 0 1 0 1 0
6 Coups et blessures volontaires suivis de mort 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 3 0 1 0 1 0 0

This dataset is our main one, it tells us for example, that in the department “01” in August 2022, there was 1 case of counterfeit currency. we will be able to calculate the total number of crimes happening each month by department.

Source: Crime by state

Unemployement

Libellé idBank Dernière.mise.à.jour Période X1982.T1 X1982.T2 X1982.T3 X1982.T4 X1983.T1 X1983.T2 X1983.T3 X1983.T4 X1984.T1 X1984.T2 X1984.T3 X1984.T4 X1985.T1 X1985.T2 X1985.T3 X1985.T4 X1986.T1 X1986.T2 X1986.T3 X1986.T4 X1987.T1 X1987.T2 X1987.T3 X1987.T4 X1988.T1 X1988.T2 X1988.T3 X1988.T4 X1989.T1 X1989.T2 X1989.T3 X1989.T4 X1990.T1 X1990.T2 X1990.T3 X1990.T4 X1991.T1 X1991.T2 X1991.T3 X1991.T4 X1992.T1 X1992.T2 X1992.T3 X1992.T4 X1993.T1 X1993.T2 X1993.T3 X1993.T4 X1994.T1 X1994.T2 X1994.T3 X1994.T4 X1995.T1 X1995.T2 X1995.T3 X1995.T4 X1996.T1 X1996.T2 X1996.T3 X1996.T4 X1997.T1 X1997.T2 X1997.T3 X1997.T4 X1998.T1 X1998.T2 X1998.T3 X1998.T4 X1999.T1 X1999.T2 X1999.T3 X1999.T4 X2000.T1 X2000.T2 X2000.T3 X2000.T4 X2001.T1 X2001.T2 X2001.T3 X2001.T4 X2002.T1 X2002.T2 X2002.T3 X2002.T4 X2003.T1 X2003.T2 X2003.T3 X2003.T4 X2004.T1 X2004.T2 X2004.T3 X2004.T4 X2005.T1 X2005.T2 X2005.T3 X2005.T4 X2006.T1 X2006.T2 X2006.T3 X2006.T4 X2007.T1 X2007.T2 X2007.T3 X2007.T4 X2008.T1 X2008.T2 X2008.T3 X2008.T4 X2009.T1 X2009.T2 X2009.T3 X2009.T4 X2010.T1 X2010.T2 X2010.T3 X2010.T4 X2011.T1 X2011.T2 X2011.T3 X2011.T4 X2012.T1 X2012.T2 X2012.T3 X2012.T4 X2013.T1 X2013.T2 X2013.T3 X2013.T4 X2014.T1 X2014.T2 X2014.T3 X2014.T4 X2015.T1 X2015.T2 X2015.T3 X2015.T4 X2016.T1 X2016.T2 X2016.T3 X2016.T4 X2017.T1 X2017.T2 X2017.T3 X2017.T4 X2018.T1 X2018.T2 X2018.T3 X2018.T4 X2019.T1 X2019.T2 X2019.T3 X2019.T4 X2020.T1 X2020.T2 X2020.T3 X2020.T4 X2021.T1 X2021.T2 X2021.T3 X2021.T4 X2022.T1 X2022.T2 X2022.T3 X2022.T4 X2023.T1 X2023.T2
Taux de chômage localisé par région - France métropolitaine 1515842 29/09/2023 12:00 NA 6.7 6.8 7.0 7.0 7.0 7.0 7.2 7.5 7.9 8.3 8.5 8.7 8.8 8.8 8.8 8.8 8.7 8.8 8.8 8.9 9.0 9.0 8.8 8.8 8.7 8.6 8.5 8.3 8.1 7.9 7.8 7.8 7.7 7.7 7.6 7.6 7.6 7.7 8.0 8.2 8.4 8.6 8.8 9.0 9.2 9.5 9.8 10.1 10.3 10.4 10.3 10.1 9.8 9.6 9.5 9.6 9.9 10.1 10.2 10.2 10.3 10.3 10.3 10.1 9.9 9.8 9.7 9.8 9.8 9.8 9.5 9.1 8.7 8.3 8.0 7.7 7.4 7.3 7.3 7.4 7.5 7.5 7.5 7.6 8.0 8.1 8.0 8.4 8.6 8.4 8.5 8.5 8.3 8.5 8.7 8.7 8.8 8.6 8.6 8.0 8.2 7.8 7.7 7.1 6.9 7.0 7.1 7.4 8.2 8.9 8.9 9.1 9.0 8.9 8.9 8.8 8.8 8.7 8.9 9.0 9.1 9.4 9.4 9.8 10.0 10.1 9.9 9.8 9.8 9.8 9.9 10.1 10.0 10.2 10.0 9.9 9.9 9.7 9.6 9.8 9.3 9.2 9.2 8.7 9.0 8.8 8.6 8.4 8.5 8.1 8.1 7.9 7.7 7.0 8.8 7.9 8.0 7.7 7.7 7.2 7.1 7.1 7.1 7.0 6.9 6.9
Codes NA NA A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A SD
Taux de chômage localisé par région - Île-de-France 1515843 29/09/2023 12:00 NA 5.6 5.7 5.7 5.7 5.6 5.6 5.8 6.0 6.2 6.5 6.7 6.8 6.9 7.0 7.2 7.3 7.3 7.3 7.3 7.4 7.5 7.5 7.5 7.4 7.4 7.3 7.3 7.1 6.9 6.7 6.6 6.5 6.5 6.5 6.4 6.4 6.4 6.6 6.9 7.0 7.1 7.3 7.6 7.9 8.2 8.6 8.9 9.2 9.3 9.3 9.2 9.1 8.9 8.8 8.7 8.8 9.0 9.2 9.2 9.2 9.3 9.4 9.4 9.2 9.1 9.0 8.9 8.9 8.9 8.8 8.5 8.2 7.8 7.4 7.1 6.8 6.5 6.4 6.5 6.6 6.8 7.1 7.3 7.4 7.9 8.1 8.1 8.4 8.7 8.5 8.6 8.6 8.3 8.4 8.5 8.4 8.4 8.2 8.1 7.5 7.7 7.4 7.2 6.6 6.3 6.4 6.4 6.5 7.1 7.6 7.8 8.1 8.0 7.9 7.9 7.9 7.8 7.8 7.9 8.0 8.1 8.3 8.3 8.5 8.7 8.8 8.7 8.7 8.7 8.7 8.8 9.0 8.8 9.0 8.9 8.8 8.8 8.6 8.5 8.7 8.3 8.2 8.3 7.8 8.1 8.0 7.8 7.7 7.6 7.4 7.4 7.2 7.0 6.3 8.3 7.7 7.8 7.5 7.6 7.1 7.0 7.0 6.9 6.8 6.6 6.7
Codes NA NA A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A SD
Taux de chômage localisé par région - Centre-Val de Loire 1515847 29/09/2023 12:00 NA 5.7 5.7 5.9 6.0 6.0 6.0 6.3 6.7 7.1 7.4 7.7 8.0 8.0 8.0 8.1 8.0 8.1 8.2 8.1 8.3 8.4 8.4 8.2 8.0 7.9 7.7 7.6 7.4 7.2 7.0 6.8 6.8 6.6 6.4 6.4 6.4 6.4 6.5 6.8 7.0 7.3 7.5 7.6 7.8 8.0 8.4 8.7 9.0 9.2 9.2 9.2 8.9 8.6 8.3 8.1 8.3 8.6 8.8 9.0 9.1 9.2 9.0 8.9 8.7 8.5 8.4 8.3 8.3 8.3 8.2 7.9 7.5 7.1 6.7 6.5 6.2 6.0 5.9 5.9 6.1 6.1 6.2 6.3 6.4 6.7 7.0 7.0 7.4 7.6 7.4 7.5 7.6 7.4 7.5 7.7 7.7 7.8 7.6 7.6 7.1 7.1 6.7 6.7 6.2 5.9 6.1 6.3 6.6 7.5 8.1 8.1 8.4 8.4 8.3 8.3 8.2 8.1 8.0 8.3 8.4 8.5 8.8 9.0 9.3 9.5 9.6 9.5 9.4 9.4 9.5 9.6 9.8 9.6 9.8 9.7 9.6 9.7 9.5 9.4 9.5 9.1 9.0 8.9 8.4 8.7 8.5 8.5 8.3 8.4 8.0 8.0 7.8 7.4 6.8 8.2 7.2 7.4 7.2 7.2 6.7 6.8 6.9 6.8 6.7 6.6 6.7
Codes NA NA A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A SD

Source: Unemployement by state

Middle school final exam

Session Numero.d.etablissement Type.d.etablissement Patronyme Secteur.d.enseignement Commune Libellé.commune Code.département Libellé.département Code.académie Libellé.académie Code.région Libellé.région Inscrits Presents Admis Admis.sans.mention Nombre_d_admis_Mention_AB Admis.Mention.bien Admis.Mention.très.bien Taux.de.réussite
2008 0381810L COLLEGE GERARD PHILIPE PUBLIC 38169 FONTAINE 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 97 94 57 20 20 16 1 60,60%
2008 0381818V LYCEE PROFESSIONNEL - PUBLIC 38563 VOIRON 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 19 19 19 13 5 1 0 100,00%
2008 0381888W LYCEE PROFESSIONNEL - PUBLIC 38509 LA TOUR-DU-PIN 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 24 24 22 11 8 3 0 91,60%
2008 0382099A LYCEE ROGER DESCHAUX PUBLIC 38474 SASSENAGE 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 24 22 11 4 7 0 0 50,00%
2008 0382100B COLLEGE GRESIVAUDAN PUBLIC 38397 SAINT-ISMIER 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 193 190 174 38 57 45 34 91,50%
2008 0382104F COLLEGE LOUIS ARAGON PUBLIC 38553 VILLEFONTAINE 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 90 86 53 33 9 8 3 61,60%

At the end of middle school in France, sutdents must complete an exam and if they succed, the students will receive their “brevet des collèges” and can continue to high school. This would be similar to the “Certificat” in Switzerland. Here this database gives us results for each middle school in France. The first column is exam year. Column 2 to 13 tell us where and what type of school it was. After column 13, we find how many students registered, how many came and how many students obtained the “Brevet”. Lastly, we get more information regarding how well the students did. A “mention” can be obtained if the students acheive a certain average grade. Lastly, the success rate is calculated by dividing the number of students who obtained the diploma by the number of students who came the day of the exam. We will not use this sucess rate as we believe that computing the success rate by diving students who obtained the degree with registered students would give us more information regarding education levels in France.

Source: Middle school exam results by state

Election Results

Code du département Libellé du département Inscrits Abstentions % Abs/Ins Votants % Vot/Ins Blancs % Blancs/Ins % Blancs/Vot Nuls % Nuls/Ins % Nuls/Vot Exprimés % Exp/Ins % Exp/Vot Sexe...17 Nom...18 Prénom...19 Voix...20 % Voix/Ins...21 % Voix/Exp...22 Sexe...23 Nom...24 Prénom...25 Voix...26 % Voix/Ins...27 % Voix/Exp...28
1 Ain 415950 93130 22.39 322820 77.61 28852 6.94 8.94 8738 2.10 2.71 285230 68.57 88.36 M MACRON Emmanuel 173809 41.79 60.94 F LE PEN Marine 111421 26.79 39.06
2 Aisne 375791 90745 24.15 285046 75.85 22838 6.08 8.01 9067 2.41 3.18 253141 67.36 88.81 M MACRON Emmanuel 119202 31.72 47.09 F LE PEN Marine 133939 35.64 52.91
3 Allier 253479 59294 23.39 194185 76.61 18877 7.45 9.72 8522 3.36 4.39 166786 65.80 85.89 M MACRON Emmanuel 106579 42.05 63.90 F LE PEN Marine 60207 23.75 36.10
4 Alpes-de-Haute-Provence 126459 29255 23.13 97204 76.87 9671 7.65 9.95 3722 2.94 3.83 83811 66.28 86.22 M MACRON Emmanuel 48994 38.74 58.46 F LE PEN Marine 34817 27.53 41.54
5 Hautes-Alpes 109892 24895 22.65 84997 77.35 8670 7.89 10.20 2699 2.46 3.18 73628 67.00 86.62 M MACRON Emmanuel 47211 42.96 64.12 F LE PEN Marine 26417 24.04 35.88
6 Alpes-Maritimes 761780 198631 26.07 563149 73.93 47784 6.27 8.49 12414 1.63 2.20 502951 66.02 89.31 M MACRON Emmanuel 278407 36.55 55.35 F LE PEN Marine 224544 29.48 44.65

We take this sheet for the second round of election results in 2017. This second round opposed Emmanuel Macron to Marine Lepen. In this, dataset, we have the number of citizen with voting rights, how many voted, blank ballots, invalid ballots and ballots for each candidates. We also have some basic information on the candidate, such as name and gender. Lastly we also have various ratios, with one being of great interest, the percentage of votes that Marine Lepen received. We decided to take the 2017 election because with our year being 2019, seeing the “impact” of the last election on crime could yield interesting results.

Source: 2017 Election results by state

Immigration

Immigrés et étrangers par département français ...2 ...3 ...4 ...5 ...6 ...7
NA NA NA NA NA NA NA
NA Immigrés Non immigrés Ensemble Etrangers Français Ensemble
Ain 78968 573464 652432 592775 59657 652432
Aisne 25414 505931 531345 512545 18800 531345
Allier 17423 318552 335975 324064 11911 335975
Alpes-de-Haute-Provence 13309 150999 164308 155227 9081 164308

Here we take this sheet and look at the number of immigrants by state. According to the documentation, an immigrant is a person born outside of France and living in France, while a foreigner is someone who does not have the french nationality. It should be noted that the label for foreigners and French citizens appears to have been switched in the dataframe. Fortunately, we will only use the number of immigrants for our study.

Source: Immigration numbers by departement

Population Density

Figure 3 - Densité de population en 2019 ...2 ...3
NA NA nombre d'habitants au km²
Département NA Densité
01 Ain 113
02 Aisne 72
03 Allier 46
04 Alpes-de-Haute-Provence 24

Source: Population Density by state

2.1.2 Data by town

Crime

CODGEO_2023 annee classe unité.de.compte valeur.publiée faits tauxpourmille complementinfoval complementinfotaux POP millPOP LOG millLOG
01001 16 Autres coups et blessures volontaires victime ndiff NA NA 1,4139534883720930 0,8994774745986377 767 16 3,48000000000000e+02 16
01001 17 Autres coups et blessures volontaires victime ndiff NA NA 1,4744186046511627 0,9308908948886161 776 17 3,53730434782610e+02 17
01001 18 Autres coups et blessures volontaires victime ndiff NA NA 1,6744186046511629 1,0494250025506857 771 18 3,53730434782608e+02 18
01001 19 Autres coups et blessures volontaires victime diff 0 0,00000000000000e+00 NA NA 779 19 3,59574601708852e+02 19
01001 20 Autres coups et blessures volontaires victime ndiff NA NA 1,4870689655172413 0,9798018812196120 806 20 3,66185064822065e+02 20
01001 21 Autres coups et blessures volontaires victime ndiff NA NA 1,7098214285714286 1,1468851437948424 806 20 3,66185064822065e+02 20
01001 22 Autres coups et blessures volontaires victime ndiff NA NA 1,8088888888888890 1,2238135473453471 806 20 3,66185064822065e+02 20
01004 16 Autres coups et blessures volontaires victime diff 39 2,76968965272353e+00 NA NA 14081 16 7,12611602800044e+03 16
01004 17 Autres coups et blessures volontaires victime diff 38 2,70751692198076e+00 NA NA 14035 17 7,21711239192120e+03 17
01004 18 Autres coups et blessures volontaires victime diff 46 3,23852435933540e+00 NA NA 14204 18 7,34158489570647e+03 18

This dataset is quite big with {r}nrow(Crimes). The first column is the CODGEO, this is a towncode by the INSEE(French institute of statistics and economics studies) the first 2 numbers are the department that the town is located in and the last 3 numbers are the town number. This first column will be very useful because it is a unique number for each town and we will use it as our key. The second column is our year, the third is the type of crime commited. The fourth column “valeur.publiée” tells us if the town publishes their crime statistics or not. It should be noted that some town are coded as publishing although the value is missing. We will dive into this later in the EDA part of our report. We then have a column telling us the total number of crimes and one for crime rate for a thousand people. The remaining columns will not be used in our analysis. The remaining LOG variables are for households in the town and the mill is a “millésime” method of census that allows a census to be conducted every year at a smaller scale, this is useful to track changes in the population year by year, but it will not be used by us.

Lastly, the rules for publishing data are the following: crime data is only published if the town records more than 5 instances of crime for 3 consecutive years. This is done to protect the privacy of the people involved and because the statistical analysis of such small data would not yield satisfying results.

Source: Crime by town

Density

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27
1 01 ozan OZAN ozan Ozan O250 OSN 01190 284 01284 2 26 6 618 469 500 93 6.60 4.91667 46.3833 2866 51546 +45456 462330 170 205
2 01 cormoranche-sur-saone CORMORANCHE-SUR-SAONE cormoranche sur saone Cormoranche-sur-Saône C65652625 KRMRNXSRSN 01290 123 01123 2 27 6 1058 903 1000 107 9.85 4.83333 46.2333 2772 51379 +44953 461427 168 211
3 01 plagne-01 PLAGNE plagne Plagne P425 PLKN 01130 298 01298 4 03 6 129 83 100 20 6.20 5.73333 46.1833 3769 51324 +54342 461131 560 922
4 01 tossiat TOSSIAT tossiat Tossiat T230 TST 01250 422 01422 2 25 6 1406 1111 1400 138 10.17 5.31667 46.1333 3309 51268 +51854 460828 244 501
5 01 pouillat POUILLAT pouillat Pouillat P430 PLT 01250 309 01309 2 33 6 88 58 100 14 6.23 5.43333 46.3333 3435 51475 +52542 461938 333 770
6 01 torcieu TORCIEU torcieu Torcieu T620 TRS 01230 421 01421 1 28 6 698 643 700 65 10.72 5.40000 45.9167 3398 51025 +52343 455521 257 782

This dataframe contains various information regarding french towns. Here, can be found the size of each town, the population but also the altitude for example. Here we will use the population in 2010 and the size to compute the density.

Source: Density by town

Population by town

NIVGEO CODGEO LIBGEO SEXE AGEPYR10 NB
COM 01001 L'Abergement-Clémenciat 1 00 14.812696
COM 01001 L'Abergement-Clémenciat 2 00 13.905300
COM 01001 L'Abergement-Clémenciat 1 03 10.848149
COM 01001 L'Abergement-Clémenciat 2 03 6.856037
COM 01001 L'Abergement-Clémenciat 1 06 23.655840
COM 01001 L'Abergement-Clémenciat 2 06 32.462397

This dataset is quite html, we have a column indicating the level, here is is towns, the CODGEO mentioned above, the town name, followed by the gender and age group. finally we have the number of inhabitants that fit these criterias.

Source: Density by town

Population by diploma

Région en géographie courante Département en géographie courante Commune en géographie courante Indicateur de stabilité de la commune Département en géographie 2021 Libellé de commune Aucun diplôme Hommes 16 à 24 ans RP2019 Aucun diplôme Hommes 25 ans ou plus RP2019 Aucun diplôme Femmes 16 à 24 ans RP2019 Aucun diplôme Femmes 25 ans ou plus RP2019 Diplôme de niveau CEP Hommes 16 à 24 ans RP2019 Diplôme de niveau CEP Hommes 25 ans ou plus RP2019 Diplôme de niveau CEP Femmes 16 à 24 ans RP2019 Diplôme de niveau CEP Femmes 25 ans ou plus RP2019 Diplôme de niveau BEPC Hommes 16 à 24 ans RP2019 Diplôme de niveau BEPC Hommes 25 ans ou plus RP2019 Diplôme de niveau BEPC Femmes 16 à 24 ans RP2019 Diplôme de niveau BEPC Femmes 25 ans ou plus RP2019 Diplôme de niveau CAP-BEP Hommes 16 à 24 ans RP2019 Diplôme de niveau CAP-BEP Hommes 25 ans ou plus RP2019 Diplôme de niveau CAP-BEP Femmes 16 à 24 ans RP2019 Diplôme de niveau CAP-BEP Femmes 25 ans ou plus RP2019 Diplôme de niveau bac général ou technique Hommes 16 à 24 ans RP2019 Diplôme de niveau bac général ou technique Hommes 25 ans ou plus RP2019 Diplôme de niveau bac général ou technique Femmes 16 à 24 ans RP2019 Diplôme de niveau bac général ou technique Femmes 25 ans ou plus RP2019 Diplôme universitaire de 1er cycle Hommes 16 à 24 ans RP2019 Diplôme universitaire de 1er cycle Hommes 25 ans ou plus RP2019 Diplôme universitaire de 1er cycle Femmes 16 à 24 ans RP2019 Diplôme universitaire de 1er cycle Femmes 25 ans ou plus RP2019 Diplôme universitaire de 2ème ou 3ème cycle Hommes 16 à 24 ans RP2019 Diplôme universitaire de 2ème ou 3ème cycle Hommes 25 ans ou plus RP2019 Diplôme universitaire de 2ème ou 3ème cycle Femmes 16 à 24 ans RP2019 Diplôme universitaire de 2ème ou 3ème cycle Femmes 25 ans ou plus RP2019
84 01 001 1 01 Abergement-Clémenciat 0.00000 31.273728 0.00000 26.319005 0 36.504517 0 42.168740 4.820431 25.195300 0.00000 25.89721 0.000000 72.04280 10.047479 66.528046 9.909446 51.11133 4.954723 50.93525 10.18551 25.62489 0.000000 35.40368 0.00000 45.71589 0.000000 30.433781
84 01 002 1 01 Abergement-De-Varey 0.00000 0.000000 0.00000 5.532044 0 5.137428 0 10.402373 0.000000 5.201186 0.00000 0.00000 0.000000 31.54036 0.000000 10.466922 0.000000 31.27326 0.000000 26.13662 0.00000 15.73425 0.000000 10.59682 0.00000 10.53227 0.000000 21.129095
84 01 003 0 01 Amareins NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
84 01 004 1 01 Ambérieu-En-Bugey 76.83123 643.204532 35.46438 860.735124 0 209.969479 0 432.813226 42.914918 207.238203 52.61075 355.55758 181.414957 1382.33326 122.867362 1245.913073 156.889261 822.49606 188.344492 832.16467 50.77483 638.97964 50.046583 536.94747 39.28065 593.41119 47.035996 837.011266
84 01 005 1 01 Ambérieux-En-Dombes 0.00000 83.521324 0.00000 44.217172 0 19.652076 0 63.869248 9.826038 39.304153 0.00000 49.13019 4.913019 240.73793 4.913019 167.042649 4.913019 88.43434 9.826038 152.30359 0.00000 54.04321 4.913019 93.34736 0.00000 93.34736 4.913019 83.521324
84 01 006 1 01 Ambléon 0.00000 5.090909 0.00000 0.000000 0 0.000000 0 5.090909 0.000000 0.000000 0.00000 0.00000 0.000000 15.27273 0.000000 5.090909 0.000000 20.36364 5.090909 25.45455 0.00000 0.00000 0.000000 0.00000 0.00000 0.00000 0.000000 5.090909

In this dataset, column 1 to 6 are the location of the town, with indicators such as the region, the departement and the town number. Then the population is divided by gender and by diploma obtained, we will only focus on the total population over 25 who doesn’t have any diploma. The whole excel files gives us this information for every year starting in 1968. We only take the year 2019.

Source: Population by diploma by town

2017 Election results

Résultats par communes - Tour 2 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13 ...14 ...15 ...16 ...17 ...18 ...19 ...20 ...21 ...22 ...23 ...24 ...25 ...26 ...27 ...28 ...29 ...30 ...31 ...32
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Code du département Libellé du département Code de la commune Libellé de la commune Inscrits Abstentions % Abs/Ins Votants % Vot/Ins Blancs % Blancs/Ins % Blancs/Vot Nuls % Nuls/Ins % Nuls/Vot Exprimés % Exp/Ins % Exp/Vot N°Panneau Sexe Nom Prénom Voix % Voix/Ins % Voix/Exp N°Panneau Sexe Nom Prénom Voix % Voix/Ins % Voix/Exp
1 Ain 1 L'Abergement-Clémenciat 598 100 16.719999999999999 498 83.280000000000001 37 6.1900000000000004 7.4299999999999997 8 1.3400000000000001 1.6100000000000001 453 75.75 90.959999999999994 1 M MACRON Emmanuel 272 45.479999999999997 60.039999999999999 2 F LE PEN Marine 181 30.27 39.960000000000001
1 Ain 2 L'Abergement-de-Varey 209 32 15.31 177 84.689999999999998 21 10.050000000000001 11.859999999999999 6 2.8700000000000001 3.3900000000000001 150 71.769999999999996 84.75 1 M MACRON Emmanuel 93 44.5 62 2 F LE PEN Marine 57 27.27 38
1 Ain 4 Ambérieu-en-Bugey 8586 2312 26.93 6274 73.069999999999993 601 7 9.5800000000000001 217 2.5299999999999998 3.46 5456 63.549999999999997 86.959999999999994 1 M MACRON Emmanuel 3213 37.420000000000002 58.890000000000001 2 F LE PEN Marine 2243 26.120000000000001 41.109999999999999

This dataset is very similar to the one for departements.

Source: 2017 Election results by town

Poverty by town

CODGEO TP4019 TP5019 TP6019 TP60Q219 TP60IP19 AGE1TP4019 AGE1TP5019 AGE1TP6019 AGE1TP60Q219 AGE1TP60IP19 AGE2TP4019 AGE2TP5019 AGE2TP6019 AGE2TP60Q219 AGE2TP60IP19 AGE3TP4019 AGE3TP5019 AGE3TP6019 AGE3TP60Q219 AGE3TP60IP19 AGE4TP4019 AGE4TP5019 AGE4TP6019 AGE4TP60Q219 AGE4TP60IP19 AGE5TP4019 AGE5TP5019 AGE5TP6019 AGE5TP60Q219 AGE5TP60IP19 AGE6TP4019 AGE6TP5019 AGE6TP6019 AGE6TP60Q219 AGE6TP60IP19 TME1TP4019 TME1TP5019 TME1TP6019 TME1TP60Q219 TME1TP60IP19 TME2TP4019 TME2TP5019 TME2TP6019 TME2TP60Q219 TME2TP60IP19 TME3TP4019 TME3TP5019 TME3TP6019 TME3TP60Q219 TME3TP60IP19 TME4TP4019 TME4TP5019 TME4TP6019 TME4TP60Q219 TME4TP60IP19 TME5TP4019 TME5TP5019 TME5TP6019 TME5TP60Q219 TME5TP60IP19 TOL1TP4019 TOL1TP5019 TOL1TP6019 TOL1TP60Q219 TOL1TP60IP19 TOL2TP4019 TOL2TP5019 TOL2TP6019 TOL2TP60Q219 TOL2TP60IP19 TLD2TP4019 TLD2TP5019 TLD2TP6019 TLD2TP60Q219 TLD2TP60IP19 TLD3TP4019 TLD3TP5019 TLD3TP6019 TLD3TP60Q219 TLD3TP60IP19 TYM1TP4019 TYM1TP5019 TYM1TP6019 TYM1TP60Q219 TYM1TP60IP19 TYM2TP4019 TYM2TP5019 TYM2TP6019 TYM2TP60Q219 TYM2TP60IP19 TYM3TP4019 TYM3TP5019 TYM3TP6019 TYM3TP60Q219 TYM3TP60IP19 TYM4TP4019 TYM4TP5019 TYM4TP6019 TYM4TP60Q219 TYM4TP60IP19 TYM5TP4019 TYM5TP5019 TYM5TP6019 TYM5TP60Q219 TYM5TP60IP19 TYM6TP4019 TYM6TP5019 TYM6TP6019 TYM6TP60Q219 TYM6TP60IP19 OPR1TP4019 OPR1TP5019 OPR1TP6019 OPR1TP60Q219 OPR1TP60IP19 OPR2TP4019 OPR2TP5019 OPR2TP6019 OPR2TP60Q219 OPR2TP60IP19 OPR3TP4019 OPR3TP5019 OPR3TP6019 OPR3TP60Q219 OPR3TP60IP19 OPR4TP4019 OPR4TP5019 OPR4TP6019 OPR4TP60Q219 OPR4TP60IP19 OPR5TP4019 OPR5TP5019 OPR5TP6019 OPR5TP60Q219 OPR5TP60IP19
01004 16 22 28 7980 0.385 17 23 30 7950 0.388 23 31 37 7500 0.423 19 28 35 8060 0.379 14 19 23 7180 0.447 NA 10 15 9330 0.281 NA NA NA NA NA 12 17 22 7750 0.404 9 12 16 7930 0.389 13 20 26 8430 0.351 14 21 27 8420 0.352 34 45 53 7560 0.418 5 6 9 9460 0.271 26 35 43 7520 0.421 37 49 58 7170 0.448 14 19 28 8580 0.34 NA NA 20 7030 0.459 NA 18 24 8430 0.351 NA NA 9 9160 0.295 17 24 31 8280 0.362 31 40 49 6920 0.467 NA NA NA NA NA 12 18 25 8960 0.310 67 NA NA 5980 0.539 NA NA NA NA NA 8 12 17 8760 0.326 NA NA NA NA NA
01007 NA 9 13 9400 0.276 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 38 8990 0.308 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 12 10020 0.229 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
01014 8 13 19 9390 0.277 NA NA NA NA NA NA NA NA NA NA NA NA 25 9690 0.254 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 34 9430 0.274 NA NA 10 9790 0.246 17 27 37 8920 0.313 NA 30 41 9050 0.303 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 13 20 9480 0.270 NA NA NA NA NA NA NA NA NA NA NA 10 17 10010 0.229 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
01024 NA 8 12 9430 0.274 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8 10000 0.230 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11 8880 0.316 NA NA NA NA NA NA NA NA NA NA NA NA 11 10410 0.198 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
01025 5 7 11 9020 0.305 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 7 9760 0.249 NA NA 34 8310 0.360 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 10 8400 0.353 NA NA NA NA NA NA NA NA NA NA NA NA 9 9690 0.254 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
01031 17 23 29 7930 0.389 NA NA NA NA NA NA 37 44 7660 0.410 23 32 40 8020 0.383 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 35 47 56 7310 0.437 NA NA NA NA NA 28 37 47 7850 0.396 30 41 52 7920 0.390 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 19 26 34 8290 0.361 NA NA NA 6640 0.489 NA NA NA NA NA 13 20 28 8780 0.324 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

This dataset does not have easily understable column names. The first is CODGEO, the town code. The columns starting with “TP” indicate poverty rates. the number afterwards indicate the rate at which people are considered poor. For example, with 60% which is the commonly used cutoff for poverty, the number means that a certain percentage of people earn only 60% of the median revenue in France. The column “TP60IP19” is the poverty intensity. basically how far from the median revenue are those poor people. We will also use this infomarmation. The remaining columns divide those values by age

Source: Poverty by town

2.2 Wrangling

2.2.1 By department

List of departements.

We create a list of states with their numbers using the population dataset.

List_depart <- List_depart[5:100, 1:2]
colnames(List_depart) <- c("Dep_number", "Dep_name")

knitr::kable(head(List_departe, n=3), "html")

To do this, we select our states and only the name and state code from our Population dataset. For reasons stated above, we only select metropolitan states. This means only selecting the first 96 states. We then rename our columns with easy to remember names that will be used as our key in the future.

Population by year and department

We will create 2 datasets.

years_pop <- as.character(1975:2023) #every year we need to extract a sheet
Pop_by_year <- List_depart[1] #list with depart numbers 
Col_name <-  paste0("pop_", years_pop) #List to rename the columns


for (i in 1:length(years_pop)) {
  pop <- 
    read_excel(here::here("Raw_data/population par département.xls"), 
               sheet = years_pop[i]) 
  Pop_by_year <- cbind(Pop_by_year, pop[5:100, 8])#add them to the df
  colnames(Pop_by_year)[i+1] <- Col_name[i] #rename cols
}

our first dataset is the population by years. To create this, we create a vector of all the years present in the excel file. using a for loop, we open each excel sheet for each year, add them to our list of departments dataset created previously and rename it to a standard name. This dataset will be useful for our graphical representations.

we then, we take this dataset and only keep the population in 2019 and the department number. We will use this for our clustering analysis.

Pop_2019 <- Pop_by_year |> 
  select(Dep_number, pop_2019)
View(Pop_2019)

Crime

years_crime <- 1996:2021
Crime_total <- t(data.frame(years_crime))
colnames(Crime_total) <- Crime_total[1,]

view(Crime_total)

for (i in 1:nrow(List_depart)) {
  crimes_depart <- read_excel(here::here("Raw_data/crimes_depart.xlsx"), 
                              sheet = List_depart$Dep_number[i]) 
  #Load each datasheet with the department list
  
  crimes_depart <- crimes_depart |>
    select(-(1:10)) |> 
    summarise_all(sum) |>
    pivot_longer(cols = everything(), names_to = "Year", values_to = "Dep_number") |>
    mutate(Year = substr(Year, 2, 5)) |>
    group_by(Year) |>
    summarize(Dep_number = sum(Dep_number)) |> t()
  
  Crime_total <- rbind(Crime_total, crimes_depart[2,])
}

Crime_total <-  Crime_total[-1, ]
Crime_total <- cbind(List_depart[1], Crime_total)

We create a total number of crimes committed by department and by year.

Crime_2019 <- Crime |>
  select(Dep_number, `2019`) |>
  rename(Crime_tot_2019 = `2019`)

Crime_2019 <- full_join(Crime_2019, Pop_2019, join_by("Dep_number"))
View(Crime_2019)

Crime_2019 <- Crime_2019 |>
  mutate(Crime_rate_1k = (Crime_tot_2019/pop_2019)*1000)

We then only select the year 2019 and compute the crime rate per thousand people. To do this we must add pur previously created population in 2019 dataframe.

Unemployment

This dataset is a bit trickier.

dummy <- seq_len(nrow(Unemployement)) %% 2 #Dummy variable equal to 1 if the row is an odd number
Unemployement <- cbind(dummy, Unemployement)

Unemployement <- Unemployement |>
  filter(dummy == 1) |>
  select(-c(1, 3,4,5))
Unemployement <-  Unemployement[-(1:15),] # only select departments

Unemployement <- Unemployement |> 
  separate(Libellé, into = c("X", "Departement") ,sep=" - ") |>
  select(-1) 

Unemployement <- Unemployement[-(97:100), ]

First we need to remove every odd number row as they do not provide us with any information. We create a dummy variable that equals 1 if the row is an odd number, we then only select our rows with data. We then only keep the department name from the department name. We then only select metropolitan departments

Unemployment_quarterly <- Unemployement |>
  pivot_longer(cols = -Dep_name, names_to = "Year", values_to = "Total") |>
  mutate(Total = as.numeric(Total)) |>
  mutate(Year = substr(Year, 2, 8)) |>
  group_by(Dep_name, Year) |>
  summarize(Total= mean(Total)) |>
  pivot_wider(names_from = Year, values_from = Total)

Unemployment_quarterly <- full_join(List_depart, Unemployment_quarterly, join_by(Dep_name))

We now need to compute the quarterly unemployment rate. To do this we use pivot_longer to be able to summarize our data by year and department. we then compute a mean unemployment rate for each year and rearrange our departments names to be inline with our list of departments using the full_join() function.

Unemployment_year <- Unemployment_quarterly |>
  pivot_longer(cols = -(Dep_number:Dep_name), names_to = "Year", values_to = "Total") |>
  mutate(Year = substr(Year, 1, 4)) |>
  group_by(Dep_number, Dep_name, Year) |>
  summarize(Total = mean(Total)) |>
  pivot_wider(names_from = Year, values_from = Total)

We now compute the Unemployment by year by computing the mean of each quarter.

Unemp <- Unemployment_year |> 
  select(Departement, `2019`) |>
  rename(Dep_name = Departement,
         Unemp_2019 = `2019`)

And we select only the year 2019 for our clustering and PCA analysis.

Middle school results

Middle <- Middle_results |>
  filter(Session == 2019) |>
  select(Code.département, Admis, Inscrits) |>
  mutate(Pass_rate = (Admis/Inscrits)) |>
  group_by(Code.département) |>
  summarize(Pass_rate_2019 = mean(Pass_rate)) |>
  rename(Dep_number = Code.département) |>
  mutate(Dep_number = substr(Dep_number, 2, 3))

Middle <- Middle[(2:97), ]

Here we start by only taking the year 2019, we then only select useful columns, we then compute the Pass_rate which is the percentage of students who obtained the certificate. We then change the department number to be inline with our naming convention. Lastly, we select metropolitan departments.

Election results

Election_by_dep <- Election_dep |>
  select(c(1,28)) |>
  rename(Dep_number = `Code du département`,
         Lepen_score = `% Voix/Exp...28`) |>
  mutate(Win_lepen = ifelse(Lepen_score>50, 1, 0))

Election_by_dep <- Election_by_dep[1:96, ]

#rename dep_number
for (i in 1:nrow(Election_by_dep)) {
  if(nchar(Election_by_dep$Dep_number[i]) == 1) {
    Election_by_dep$Dep_number[i] <- paste0("0",  Election_by_dep$Dep_number[i])
  }
}

For our Election results, we select the columns we need. Then, we select metropolitan departments. Then we need to change our department number to our naming convention to be able to use it as a key. As a reminder, our naming convention is to write single digit department with a leading zero.

Immigration

Immig_2019 <- Immigration_2019 |>
  slice(1:96) |>
  select(c(1, 2, 4)) |>
  rename(Dep_name = `...1`,
         Immig_tot = Immigrés,
         pop_2019 = `Ensemble...4`) |>
  mutate(Immig_rate = Immig_tot/pop_2019) |>
  select(Dep_name, Immig_rate)

Immigration is quite html, we select only metropolitan departments with the slice() function. We then select columns that we need, rename them in english and compute the immigration rate which is Total number of immigrants divided by the total population.

Population density

Dens_2019 <- population_2019 |>
  rename(Dep_number = Département,
         Density_2019 = Densité) |>
  select(Dep_number, Density_2019) |>
  slice(1:96)

Similarly, population density is quite easy. we rename our selected columns to our naming convention, we then select only metropolitan departments using the slice() function.

Join them into a single dataset

Our last step is to join all of our previously cleaned datasets into a single dataset

Full_data_dep <- full_join(List_depart, Unemp, join_by("Dep_name")) |>
  full_join(Crime_2019, join_by("Dep_number")) |>
  full_join(Middle, join_by("Dep_number")) |>
  full_join(Election_by_dep, join_by("Dep_number")) |>
  full_join(Immig_2019, join_by("Dep_name")) |>
  full_join(Dens_2019, join_by("Dep_number"))

2.2.2 By Town

Crime

Type_per_year <- Crimes |> 
  filter(valeur.publiée == "diff") |>
  filter(!grepl("^97", CODGEO_2023)) |>
  select(CODGEO_2023, annee, classe, faits, tauxpourmille)

Type_per_year <- cbind(Type_per_year[1], Type_per_year)
colnames(Type_per_year) <- c("Departement", "Town", "Year", "Type", "Number", "Rate_per_1k")

Type_per_year <- Type_per_year |> 
  mutate(Departement = substr(Departement, 1, 2)) |>
  mutate(Rate_per_1k = str_replace_all(Rate_per_1k, ",", ".")) |>
  mutate(Rate_per_1k = as.numeric(Rate_per_1k)) |>
  group_by(Departement, Town, Year, Number, Type) |>
  summarize(Rate_per_1k = mean(Rate_per_1k, na.rm = TRUE), ) |> #omit the missing values because some are missing even though they were indicated as published
  group_by(Departement, Town, Year, Type, Rate_per_1k) |>
  summarize(Number = sum(Number))

We use this code to get the data by type and by year for each town. First, we only select values indicated as publishing their data (“diff”). We then only take metropolitan departments, by filtering every row except those starting with 97(the indicator for outer seas department). We then duplicate the town code to create a Department column. we rename the columns for more clarity. we then create our department column by substr() the first 2 digits of the town code. We now need to compute the rate for a thousand. To do this, we first must replace every comma by a point, that is because European notation of numbers uses a comma instead of a point for decimal numbers. we then use *group_by() and summarize() to get the mean rate. and use the same functions to get the total number of crimes. This dataframe will be useful for our plots

Crime_2019_town <- Type_per_year |>
  filter(Year == 19) |>
  group_by(Departement, Town) |>
  summarize(Rate_per_1k = mean(Rate_per_1k)) |>
  mutate(Town = paste0(Departement, Town)) |>
  rename(Town_code = Town, Dep = Departement)

We then summarize the rate value by town to use in our regression

Density

Density_town <- Dense |>
  select(c(4, 11, 15, 19)) |>
  rename(Town_name = `X4`, 
         Town_code = `X11`,
         Pop_2012 = `X15`,
         Size = `X19`) |>
  filter(!str_detect(Town_code, "^97")) |>
  mutate(Density_2019 = Pop_2012/Size) |>
  select(Town_code, Town_name, Density_2019)

To do this, we first select the columns we need. In that case, the name of the town, the Town_code, the population and the size. We then rename them to our standard, only select metropolitan departments and compute the density(Population/Size) lastly, we select only our important columns.

Population

Pop_by_town <- Pop_2019 |>
  group_by(CODGEO) |>
  summarise(Total = sum(NB)) |>
  rename(Town_code = CODGEO, Total_pop = Total)
view(Pop_by_town)

The cleaning for this dataset is very html, we compute the sum of every age group and gender group to find the total number of inhabitants by town.

Population without a diploma

Pop_no_diploma <- Pop_16_no_diploma |>
  select(c(2, 3, 6, 8, 10)) |>
  slice(-1) |>
  rename(Dep_number = `Département\nen géographie courante`,
         Town_number = `Commune\nen géographie courante`,
         Town_name = `Libellé de commune`,
         no_diploma_M = `Aucun diplôme\nHommes\n25 ans ou plus\nRP2019`,
         no_diploma_F = `Aucun diplôme\nFemmes\n25 ans ou plus\nRP2019`) |>
  group_by(Dep_number, Town_number) |>
  summarize(no_diploma = sum(as.numeric(no_diploma_F), as.numeric(no_diploma_M), na.rm = TRUE)) |>
  mutate(Town_code = paste0(Dep_number, Town_number)) |>
  select(Dep_number, Town_code, no_diploma) |>
  filter(!str_detect(Dep_number, "^97")) |>
  left_join(Pop_by_town, join_by(Town_code)) |>
  na.omit() |>
  mutate(No_diploma_rate1k = (no_diploma/Total_pop)*1000)

Paris_tot <- Pop_no_diploma |>
  filter(Dep_number == "75") |>
  summarize(No_diploma_rate1k = mean(No_diploma_rate1k)) |>
  rename(Town_code = Dep_number) |>
  mutate(Town_code = "75056")

Pop_no_diploma <- Pop_no_diploma |>
  group_by(Town_code, No_diploma_rate1k) |>
  select(Town_code, No_diploma_rate1k)

Pop_no_diploma <- rbind(Pop_no_diploma, Paris_tot)

To have our final data, we first select our needed columns. we then remove the first row and rename the columns. we then compute the total number of people without a diploma and make sur to remove any NA values. we then select only metropolitan departments, with filter() and str_detect. We then compute a left join with the dataframe Pop_by_town to get the population by town. afterwards, we compute the rate at which people do not have a diploma for a thousand people by dividing the total number of people without any diploma by the population of the town and multiply it by a thousand.

Lastly, we have noticed that this dataset provides us with data from every subdivision in the city of Paris, while this level of precision could be welcome, all of our other datasets counts Paris as a single city. As such, we must aggregate our rate for the city of Paris. To do this, we create a new df called Paris_tot, we then summarize all of it. We add the row at the end of our previous dataframe. It is unordered, but will become ordered when we use the join function later on.

2017 election results.

Vote_2017 <- Presidentielle_2017_Resultats_Communes_Tour_2_c |>
  select(c(1, 2, 3, 25, 32)) #select ony the values important to us in that case, the department, the towncode and the result
Vote_2017 <- Vote_2017[1:35281, ]
colnames(Vote_2017) <- c("Department", "Dep_name", "Town_num", "Macron", "Lepen")

#Need to add the Corse departement number(2A/2B)
Vote_2017$Department <- as.character(Vote_2017$Department)

for (i in 1:nrow(Vote_2017)) {
  if (Vote_2017$Dep_name[i] == "Corse-du-Sud") {
    Vote_2017$Department[i] <-  "2A"
  }
  if (Vote_2017$Dep_name[i] == "Haute-Corse") {
    Vote_2017$Department[i] <-  "2B"
  }  
}

#Now let's standardize the notation of departments and town codes by adding zeroes before.
for (i in 1:nrow(Vote_2017)) {
  if (nchar(Vote_2017$Department[i]) == 1) {
    Vote_2017$Department[i] <- paste0("0", Vote_2017$Department[i])
  }
  if (nchar(Vote_2017$Town_num[i])<=2) {
    if (nchar(Vote_2017$Town_num[i]) <= 1) { 
           Vote_2017$Town_num[i] <- paste0("00", Vote_2017$Town_num[i])
    }
           else {Vote_2017$Town_num[i] <- paste0("0", Vote_2017$Town_num[i])
           }
  }
}

#Create the town key
Vote_2017 <- Vote_2017 |> 
  mutate(Town_code = paste0(Department, Town_num)) |>
  mutate(Win_Lepen = ifelse(Lepen>50, 1, 0)) |>
  select(Department, Town_code, Macron, Lepen, Win_Lepen)#Create a dummy variable that equals 1 if LePen "won" the town (scored more than 50%)

we first select our needed columns. then, we only select metropolitan departments and rename our columns. the next step is to code the Corsica departments numbers. Corsica is split into 2 departments. 2A: “Corse-du-Sud” and 2B: “Haute-Corse”. these 2 departments are found between the department 19 and department 21. They used to be one single department butt were split in 2 in 1975. In our database, only the name can be found. we need the number. we use a for loop to code it into the Departement column. We then standardize our Departments numbers and town codes by adding leading zeroes. This then allows us to create our town_number key. Finally, we code a dummy variable that equals 1 if Marine Lepen had more than 50% of the votes in a town.

Poverty

Povr_2019 <- Pov_2019 |>
  select(c(1, 4, 6)) |>
  na.omit() |>
  rename(Town_code = CODGEO,
         Povrety_2019 = TP6019,
         Intensity_povrety = TP60IP19)

Poverty is quite html. we select our columns, rename them and we do not forget to omit values that were blank

Join them into a single dataframe

Finally, we join all of those datasets into a single one. To do this, we first use full_join() to see how values are missing.

Everything_by_town <- Crime_2019_town |>
  full_join(Pop_by_town, join_by(Town_code)) |> 
  full_join(Density_town, join_by(Town_code)) |>
  full_join(Vote_2017, join_by(Town_code)) |>
  full_join(Povr_2019, join_by(Town_code)) |>
  full_join(Pop_no_diploma, join_by(Town_code))
summary(Everything_by_town)

As we can see, Poverty is a smaller dataset. This may be a problem, but upon further inspection, we see that the smallest population is 1’700 inhabitants. And we still have around 5’000 observations. We previously thought about removing every observation where the number of inhabitants of a town was smaller than 200, because of the poor statistical analysis that would stem from such extreme values. the INSEE institute were most of our data has been collected recommends not doing any analysis on towns of less than 200 inhabitants. As such, the reduced number of observations with the poverty dataset is not a problem.

Everything_by_town_clean <- Everything_by_town |>
  na.omit() |>
  select(Dep_number, 
         Town_code, 
         Town_name, 
         Total_pop, Rate_per_1k, Density_2019, Lepen, Win_Lepen, Povrety_2019, Intensity_povrety, No_diploma_rate1k)

then, we omit our missing values, select only our very important values and we have our final dataset. that we may use for EDA and most importantly for our Regression analysis. Were we will predict the crime rate of a town using our independent variables.

2.3 Final results

We finally have these two datasets.

The Department level

Dep_number Dep_name Unemp_2019 Crime_tot_2019 pop_2019 Crime_rate_1k Pass_rate Lepen_score Win_lepen Immig_rate Density_2019
01 Ain 6.075 27831 652432 42.65732 0.8701494 39.06 0 0.1210364 113
02 Aisne 11.650 26577 531345 50.01835 0.8700065 52.91 1 0.0478296 72
03 Allier 8.900 14440 335975 42.97939 0.8657765 36.10 0 0.0518580 46
04 Alpes-de-Haute-Provence 9.775 8831 164308 53.74662 0.8419413 41.54 0 0.0810003 24
05 Hautes-Alpes 7.975 6135 141220 43.44286 0.8661628 35.88 0 0.0580088 25
06 Alpes-Maritimes 8.750 74434 1094283 68.02080 0.9132394 44.65 0 0.1482542 255

The town level

X Dep_number Town_code Town_name Total_pop Rate_per_1k Density_2019 Lepen Win_Lepen Povrety_2019 Intensity_povrety No_diploma_rate1k
1 01 01004 AMBERIEU-EN-BUGEY 14134 4.658571 562.39837 41.11 0 28 0.385 106.40581
2 01 01007 AMBRONAY 2800 5.020878 70.40238 43.71 0 13 0.276 42.85714
3 01 01014 ARBENT 3349 8.697631 147.97786 32.11 0 19 0.277 201.31667
4 01 01024 ATTIGNAT 3251 3.080986 157.08935 41.36 0 12 0.274 46.58719
5 01 01025 BAGE-LA-VILLE 4065 1.687562 76.61290 41.31 0 11 0.305 66.42066
6 01 01031 BELLIGNAT 3652 6.283417 451.72414 37.29 0 29 0.389 191.67903

3. Visualization

3.1 Crime

3.1.1 Evolution of the total crime rate

In this graphic we chose to compute the crime rate per hundred persons in the 5 departments with the biggest and the 5 with the lowest crime rate.

In this graphic we chose to compute the crime rate per hundred persons in the 5 departments with the highest crime rate and those with the lowest crime rate. The first thing we can observe is the major disparity between the average crime number in those departments. In fact the highest crime rate in the Paris is almost 5 times higher than the Aveyron 30.85 vs 145.25 It’s important to note that the departments with the higher crime rate are the one encompassing the biggest cities in France: Paris and it’s suburbs of “Seine-Saint-Denis” and “Hauts-de-Seine”, Marseille is in the “Bouches-du_Rhône”, Lyon is located in the “Rhône”. We can start to understand that density may be a key factor in determining crime rate.

Evolution of Crime in the 5 most criminal departments

Here we can observe that the total crimes in the most criminal departments are lower than what they were twenty years ago in the city of Paris (75) and Marseille (13) but are sensibly the same or higher in the peripheral areas of Paris (92/93) and in the city of Lyon (69). This can possibly be explained by the huge increase of density of population in the peripheral areas of big cities in France.

This graph represents the annual amount of crimes in the 5 least criminal departments. Here we can assume that the number of crimes in the safest departments is quite stable through the years. We also observe the small variation due to COVID-19 Pandemic

3.1.2 Types of crime commited

We start by looking at the evolution of the most common types of crime by year in France.

Here we can see that the most common crime in France is deliberate destruction and damage. The effect of the COVID-19 pandemic can also be observed as most crimes have seen an overall drop in 2020. For example, theft without violence saw a sharp drop and its levels have not gone back to 2019 levels of theft.

The Types of crime committed also depend on the location.

In these graphs, we can see the difference in the types of crime committed in the Paris department and in the Cher department. Cher is sparsely populated department. Here we observe that in Paris, theft without violence is the number one committed crime. This is surely due to Paris being a big city with many tourists and as such many pickpockets. This may not be the case in the Cher region with only 300’000 inhabitants. In the Cher, theft is less prevalent with the biggest Crime rate being deliberate destruction and damage. Followed by Burglaries.

3.2 Unemployment dataset

3.2.1 Evolution of the unemployment rate

Let’s now take a look at our first variable: Unemployment. Unemployment in France has evolved through time and has usually been higher than unemployment in Switzerland.

We can observe a huge increase in the unemployment rate in the 90’s in France. As we know after the Second World War, European countries had an important period of economic growth, this period is known as the « 30 glorieuses ». This period matched with an important demand of workers, from Europe and from other continents. There was a need to rebuild the country and an important diversification of the jobs with help of technological progress. In the early 80’s, an important recession started in Europe, with a decrease of job creation in most of the important European countries, especially Germany, Great Britain and France. The the unemployment rate was approximately increasing by 1% a year in France in 1990, to reach the record rate of 11% in 1997. This is mainly explained by the two oil shocks in the 70’s and 80’s. The French president at the time, Francois Mitterand said in 1993 : « In terms of unemployment, we’ve tried everything and we’ve failed everything ». Following this recession France had an important decrease of unemployment rate, the explanations are multiple but the most plausible is the creation of the Euro zone and currency. This tendency lasted until the major financial crisis of 2008. In the last decade the unemployment rate has been mastered once again until the COVID 19 pandemic in 2020 which created a massive suppression of jobs in various areas such as the show industry or the food service sector.

3.2.2 Unemployment by department

We can observe with this graph that there’s an important disparity in unemployment rates in France. It’s also relevant that one of the departments with the higher crime rate which we’ve seen before, is also one of the departments with the higher crime rates (93). This may be an indication of what we could possibly find with the statistical regressions.

3.3 Maps

This map compute the crime rate in every French department and shows us the important disparity in the amount of crimes committed across the country.

This map shows us the score in the second round of the French presidential election of 2017, which we choose to compute as a possible explanation variable. As we know Emmanuel Macron has been elected president with 66.10% so the departments overall voted for him, however it is interesting to see the departments where Marine LePen had an important share of the votes. Here those departments are represented with a darker color.

here this map shows the school pass rate for each department at the end of middle school. Unfortunately, no clear trend seems to emerge from the map. Futhermore, the difference between the highest value (0.9305) and the lowest value (0.8274) is only of 10%. This means that all other values are contained within only 10%. Showing them on a map with colors tends to over-exagerate the differences between values. As such, this map has poor vizualization power and may even be considered somewhat misleading. It is still left here in the intermediary report, but may be removed from the final report.

Lastly, this map shows us th immigration rate, be department. As we can see, Immigrants are mostly located in the vicinity of the Paris region. Other notable areas of higher immigration are border regions in the south and in the east.

4. Clustering

We will now run a cluster analysis of each department, we want to see if some departments have simmilarities.

First, we compute the correlation matrix.

Unemp_2019 Crime_rate_1k Pass_rate Lepen_score Immig_rate Density_2019
Unemp_2019 1.0000000 0.2554038 -0.4734385 0.5346616 0.1253533 -0.0838700
Crime_rate_1k 0.2554038 1.0000000 -0.2234582 -0.2514767 0.6862923 0.7659192
Pass_rate -0.4734385 -0.2234582 1.0000000 -0.3175815 -0.2521074 0.0489446
Lepen_score 0.5346616 -0.2514767 -0.3175815 1.0000000 -0.2983429 -0.4813112
Immig_rate 0.1253533 0.6862923 -0.2521074 -0.2983429 1.0000000 0.5752951
Density_2019 -0.0838700 0.7659192 0.0489446 -0.4813112 0.5752951 1.0000000

As we can see, Density is highly correlated with Crime Rates. In contrast, Density and unemployment have a very low correlation. Other interesting Correlations are the one with Immigration and Crime rates, Unemployment and the Score of Marine Lepen.

4.1 Hierarchical clustering

We now compute the distance matrix. Using this distance matrix, we run a hierarchical clustering analysis using the average distance method.

we get the following dendrogram.

This graph tells us that observation 76 stands out from the rest, Observation 76 is actually Paris. Paris is a clear outlier. Based on this graph we could try to use 3 clusters.

We can now visualize this using our 2 most correlated variables, Crime rates and Density. ::: {.cell} ::: {.cell-output-display} ::: :::

This is quite interesting, our 3 clusters are Paris, the Paris region and the rest of France. To further show this, we can see this in a map format. ::: {.cell} ::: {.cell-output-display} ::: :::

4.2 K-means clustering

To further our analysis and chose a number of clusters, we will run K-means analysis with n-start = 25. Our data will also be scaled to reduce differences. This approach is what is recommended, we will run the same clustering 25 times with different centers as we initialize. Running it 25 times will ensure that we get the best clusters we can.

We get the followig scree plot from our analysis: ::: {.cell} ::: {.cell-output-display} ::: ::: as we can tell, our optimal number of clusters seems to be 6 clusters.

We’ll visualize this using a cluster plot and a facet wrap of Lepen Scores and Unemployment rate, we previously say that the correlation was decently high. ::: {.cell} ::: {.cell-output-display} :::

:::

As before, we will map it for more clarity. ::: {.cell} ::: {.cell-output-display} ::: :::

Overall, during this clustering exercise, our main learning is that Paris and it’s suburbs are very big compared to the rest of France, so much that it tends skew the analysis. For our final report, it could be interesting to remove Paris and the prarisian region to be able to conduct an analysis of the rest of our departments

X. RoadMap

What is left to do for our report?

As things stand, we still need to run our regression and analyse its results. We also need to answer our research questions.

If possible we would also like to run a PCA analysis on the different types of crimes to see if certain states experience a type of crime more often than others, and if some crimes can be related.

We would also like to tweak various parts of our report, for example, we still need to add more graphs and at least one interactive map using the leaflet package.

We would also like to develop our EDA section by adding more information on our raw databases, with information regarding the missing values or the distribution of values.